Biological Pattern Discovery with R: Machine Learning Approaches (Zheng Rong Yang)

Chapter 2 will focus on one responsive gene discovery problem. Nowadays, essential gene discovery is mainly based on high-throughput transposon sequencing technology. With this technology, […] of mutants can be generated in one experiment. Gene essentiality is determined by estimating a density function for a transposon statistic such as the transposon insertions per gene statistic or the transposon insertion sites per gene statistic. This is a typical unsupervised learning problem. Therefore, various density estimation algorithms and cluster analysis algorithms will be introduced in this chapter, and how they are used for essential gene discovery will be demonstrated.
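
As a rough illustration of the idea, the sketch below applies a kernel density estimate and a two-cluster k-means to an insertions-per-gene statistic. It is only a minimal example under stated assumptions: the counts are simulated, the group sizes and Poisson rates are arbitrary, and k-means merely stands in for whichever density estimation and cluster analysis algorithms the chapter actually presents.

```r
# Simulated data only: insertion counts per gene for two hypothetical groups.
set.seed(1)
essential     <- rpois(50,  lambda = 1)   # assumed: few insertions tolerated
non_essential <- rpois(450, lambda = 30)  # assumed: many insertions tolerated
insertions    <- c(essential, non_essential)

# Work on a log scale so the two groups are easier to separate.
x <- log2(insertions + 1)

# Kernel density estimate of the statistic (stats::density in base R).
# plot(dens) can be used to inspect whether the estimate looks bimodal.
dens <- density(x)

# Two-cluster k-means as a simple cluster analysis of the same statistic.
km <- kmeans(x, centers = 2, nstart = 10)

# The cluster with the smaller centre is the candidate "essential" group.
low_cluster <- which.min(km$centers)
candidate   <- km$cluster == low_cluster

table(candidate)
```

Real transposon sequencing data are noisier than this simulation, so the separation between the two groups will usually be less clean.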

Chapter 3 will focus on the peptide pattern discovery problem. This problem is a type of application where protein functional sites are predicted using local protein structures, i.e., protein peptides. Typical projects of this kind mainly include the discovery of protease cleavage sites or post-translational modification sites in peptides. The same idea also applies to the discovery of DNA binding sites, transcription factor sites, etc. Protein functional site discovery employs two types of peptides, i.e., those having no functional sites and those having functional sites verified in laboratories. The latter refers to protease-cleaved peptides or post-translationally modified peptides. This type of application fits the mainstream of machine learning, i.e., classification analysis or discriminant analysis. This chapter will introduce various classification analysis algorithms and demonstrate how these algorithms are used for peptide pattern discovery. A typical difficulty in protein functional site discovery is the data type. A protein sequence is a string of amino acids, which are non-numerical. Therefore, this chapter will introduce different encoding approaches for handling amino acids in peptides so that a machine learning algorithm, which needs numerical data as input, can be used.
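
To make the encoding issue concrete, the sketch below shows one common option, a binary (one-hot) encoding, in which each residue is replaced by a 20-element 0/1 vector. It is not necessarily one of the approaches the chapter covers; `encode_peptide` is a hypothetical helper and the two peptides are made up for illustration.

```r
# The 20 standard amino acids in one-letter code.
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# encode_peptide() is a hypothetical helper, not a function from the book.
# Each residue becomes a 20-element 0/1 vector; the vectors are concatenated.
encode_peptide <- function(peptide) {
  residues <- strsplit(peptide, "")[[1]]
  idx <- match(residues, amino_acids)
  if (anyNA(idx)) stop("unknown residue in peptide: ", peptide)
  onehot <- matrix(0, nrow = length(residues), ncol = length(amino_acids))
  onehot[cbind(seq_along(residues), idx)] <- 1
  as.vector(t(onehot))  # flatten row by row: residue 1 first, then residue 2, ...
}

# Made-up peptides of equal length, for illustration only.
peptides <- c("ACDK", "WYRG")
X <- t(sapply(peptides, encode_peptide))
dim(X)  # one row per peptide, 4 residues * 20 columns each
```

The resulting matrix `X` is purely numerical, so it can be passed to a classifier that expects numeric input. The encoding assumes all peptides have the same length, which holds when they are cut as fixed-size windows around a candidate site.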

Chapter 4 will focus on the genetic-epigenetic interplay pattern discovery problem. The genetic signatures stand for genes and the epigenetic signatures stand for methylation sites, DNA copy number, etc. The main objectives of this kind of research include two types of the